Abstract Body



Background/context: 

While it has long been recognized that there is a clear and compelling need to foster young 
children’s science literacy (B. T. Bowman, 1999; New, 1999; Hecker, 2001), our P-16 
educational system is failing to provide students with a solid understanding of science and the 
capacity to succeed in tomorrow’s labor market. Students are not demonstrating rapid progress in 
science achievement (National Center for Education Statistics, 2001), particularly when 
compared with other countries (Martin et al., 2000). Furthermore, an achievement gap in science 
persists, with children of color, some children who are English language learners, and children from
low-income backgrounds demonstrating lower science proficiency than their peers (Haycock,
Jerald, & Huang, 2001; National Center for Education Statistics, 2001; Robelen, 2002). 

To foster its future workforce’s science literacy, the United States needs to improve science
education for all children, at every grade level (Haycock et al., 2001; National Center for
Education Statistics, 2001; Nelson, 1999; Singham, 2003; Sutman, 2001). In David
Hawkins’s (1983) view, science can be “the great equalizer” when it is made more
accessible by basing curriculum on everyday topics that are familiar to all children, regardless of
their backgrounds. Because differences in children’s achievement are apparent before they enter
kindergarten (Shonkoff & Phillips, 2000), realizing science’s potential for educational equity could
help bridge the achievement gap that begins in preschool.

Because many early childhood teachers lack formal higher education (Barnett, 2003; 
Whitebook, 2003), professional development is key to ensuring that teachers provide children
with cognitively challenging early learning experiences (Dwyer, Chait, & McKee, 2000;
Espinosa, 2002; Helburn & Bergmann, 2002; B. Bowman et al., 2001). Yet, few models of 
professional development build teachers’ skills and knowledge in an ongoing way and provide 
access to higher education credits. Often, professional development consists of episodic 
workshops that do not reflect research-based knowledge about effective learning (Bransford, 
Brown, & Cocking, 1999; Darling-Hammond, 1996; Gallagher & Clifford, 2000; Hyson, 2001; 
Miller, Lord, & Dorney, 1994; Morgan et al., 1993) or build on teachers’ current practice 
(Darling-Hammond, 1996; Morgan et al., 1993). Without ongoing feedback and content-focused 
mentoring, it is difficult for teachers to sustain changes in practice (Caruso & Fawcett, 1999; 
Darling-Hammond, 1996; Garet, Porter, Desimone, Birman, & Yoon, 2001). 

Over the past three years, our team at Education Development Center, Inc. (EDC) has been
researching a professional development program in science, Foundations of Science Literacy
(FSL), for preschool lead and assistant Head Start teachers in Massachusetts and Rhode Island.
Because Year 1 was a pilot year, this paper reports data from Year 2.
Foundations of Science Literacy is designed to answer the urgent call to prepare preschoolers
for tomorrow and to respond to Hawkins’s eloquent plea for equity.

Purpose/objective/research question/focus of study: 

Our research is designed to answer two important questions germane to this paper: 1) Does 
FSL impact Head Start teachers’ practices in inquiry-based science instruction for four-year-old 
children? 2) Does FSL impact Head Start children’s early science knowledge and skills? The
objective of the presentation will be to report the principal teacher-, classroom-, and child-level 
findings from our Year 2 implementation, and to discuss these findings in the context of what 
makes for a successful professional development program in early science. 

Setting: 

The research took place in the metro-west and southern sections of Massachusetts. 

Population/Participants/Subjects: 

Working with five Head Start programs in Massachusetts, we recruited lead and assistant 
teachers from 50 classrooms to participate in the study. Within each program 60% of the 
classrooms were randomly assigned to the FSL intervention group and 40% to the control group. 
A description of the recruitment and analytic samples can be found in Table 1. (Please insert 
table 1 here). 
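The within-program 60:40 assignment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure the team actually used. The function name `assign_within_programs`, the program names and classroom IDs, the fixed seed, and the rule of rounding each program's FSL share to the nearest whole classroom are all hypothetical.

```python
import random

def assign_within_programs(classrooms_by_program, p_treat=0.6, seed=12345):
    """Randomly assign classrooms to "FSL" or "control" separately within each
    Head Start program (the stratum), targeting a p_treat share for FSL.

    classrooms_by_program: dict mapping a program name to a list of classroom IDs.
    Returns a dict mapping each classroom ID to its condition.
    """
    rng = random.Random(seed)  # fixed seed makes the assignment reproducible
    assignment = {}
    for program in sorted(classrooms_by_program):  # fixed stratum order
        classrooms = list(classrooms_by_program[program])
        rng.shuffle(classrooms)
        # FSL count for this program, rounded to a whole number of classrooms
        n_fsl = round(p_treat * len(classrooms))
        for i, classroom in enumerate(classrooms):
            assignment[classroom] = "FSL" if i < n_fsl else "control"
    return assignment

# Hypothetical programs and classroom IDs, for illustration only.
programs = {
    "Program A": ["A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8", "A9", "A10"],
    "Program B": ["B1", "B2", "B3", "B4", "B5"],
}
result = assign_within_programs(programs)
for program, rooms in programs.items():
    n_fsl = sum(result[r] == "FSL" for r in rooms)
    print(f"{program}: {n_fsl} FSL, {len(rooms) - n_fsl} control")
```

With these inputs, Program A receives 6 FSL and 4 control classrooms, and Program B receives 3 and 2. Each program is rounded separately, so in small programs the overall split can drift slightly from exactly 60:40.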

Intervention/Program/Practice: 

FSL has two main components: 1) instructional sessions that are conducted face-to-face and
designed to build teachers’ content knowledge in specific concepts in physical science and
enhance their ability to teach science to young children; and 2) a mentoring component that
provides coaching support to teachers as they master science content and implement inquiry-based
science methods. Drawing on our experience training early childhood teachers, we deliver
FSL over a six-month period, because we have found it essential to extend the timeframe for
coursework beyond that typically allotted by institutions of higher education. Doing so paces the
learning experience, allowing teachers to digest and apply new material while continuing to meet
their job obligations (Dickinson & Brady, 2004). Initial sessions focus primarily on building
teachers’ own science content knowledge through an inquiry-based approach. As
the program progresses, sessions focus more on the content and pedagogy appropriate for young
children.

In addition, FSL includes a set of three key design features. First, teachers learn best when 
they see examples of the practices they are adopting. Videotape exemplars, coupled with teacher 
commentary, build teachers’ capacity to analyze and reanalyze the effectiveness of practices in 
light of children’s responses. Powerful vehicles for showing teacher-child and child-to-child 
interaction, they demonstrate the complex interactions among instruction, assessment, and 
children’s learning. As “pictures of practice,” they demystify how to introduce investigations, 
conduct rich discussions, and identify and work with children’s naive theories. They also build 
teachers’ ability to engage in focused, professional dialogues. Second, young children express 
their science understandings and questions through conversations, drawings, narratives, and play. 
Yet, many teachers are not aware of the assessment opportunities that these sources of data 
provide. In FSL, we provide teachers with children’s work samples that illustrate a range of 
understanding and a diversity of modes of expression. Such samples provide teachers with the 
experience they need to assess children’s learning and prepare responsive curriculum activities 
that challenge children’s thinking. Using work from their classrooms helps teachers move from 
“abstract” analysis, in which the children and classroom are unknowns, to “authentic” analysis in 
which the hypotheses they generate can be tested and reported on. Third, performance tasks that
elicit what teachers know and are able to do help guide teachers’ mastery of key concepts and 
strategies and assist course architects and instructors in evaluating the impact of teaching and 
learning events (Brady & Chalufour, 2004). In FSL, assignments are carefully sequenced to build 
a bridge between instructional sessions and teachers’ classrooms. Participants are required to 
carry out application activities, set goals to improve their practice, and analyze the effectiveness 
of their teaching in terms of children’s science learning, development, and engagement. All 
assignments center on children’s work and/or videotapes of teachers’ practices to provide direct 
evidence of classroom practices that allow us to evaluate teachers’ learning. 

Research Design: 

Recruitment of Head Start teachers and assistant teachers was conducted in the summer and
fall of 2006. We worked with program directors in Massachusetts to recruit teachers in their
respective programs and centers. Our recruitment yielded 50 classrooms and 66 teachers (50
lead teachers and 16 assistant teachers). The classrooms of teachers and assistant teachers who
were interested in participating were randomly assigned, within strata defined by Head Start
program, to one of two conditions: FSL Intervention and Control. The randomized sample was
deliberately unbalanced in the numbers of intervention and control classrooms (Myers & Dynarski,
2003): 60% of the classrooms were assigned to the intervention group, and 40% to the control
group. This design is often preferable because it potentially maximizes cost effectiveness,
increases statistical power, and limits the number of individuals who will not benefit from the
intervention (Puma et al., 2001). Moreover, according to Puma et al. (2001), a 60:40 ratio
"reduces the precision of the impact estimates by just 2%" (p. 33).
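The 2% figure can be checked with a back-of-the-envelope calculation. Assume a fixed total sample and ignore covariate adjustment and unequal cluster sizes. Under those simplifying assumptions, the variance of a two-group impact estimate is proportional to 1/(P(1 − P)), where P is the share of units assigned to the intervention. The hypothetical helper below returns the resulting standard-error ratio relative to a balanced 50:50 split.

```python
import math

def se_ratio_vs_balanced(p_treat):
    """Standard error of a two-group impact estimate at allocation p_treat,
    relative to a 50:50 split with the same total sample size.

    Assumes Var(impact) is proportional to 1 / (p * (1 - p)); this ignores
    covariate adjustment and unequal cluster sizes.
    """
    if not 0.0 < p_treat < 1.0:
        raise ValueError("p_treat must lie strictly between 0 and 1")
    return math.sqrt(0.25 / (p_treat * (1.0 - p_treat)))

for p in (0.5, 0.6, 0.7):
    print(f"{p:.0%} intervention: SE ratio = {se_ratio_vs_balanced(p):.3f}")
```

The loop prints ratios of 1.000, 1.021, and 1.091. At 60:40 the standard error is therefore about 2% larger than under a balanced design, which agrees with the figure quoted from Puma et al. (2001). The cost grows quickly as the split becomes more lopsided.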

Data Collection and Analysis: 

Because no existing instruments were available at the inception of this project to measure
teachers’ content knowledge, the quality of classroom science instruction, or young children’s
performance-based knowledge, we created three tools to measure these constructs. These newly
developed instruments are described below. In addition, we assessed global classroom
quality using the Early Childhood Environment Rating Scale-Revised (ECERS-R).

The Science Teaching and Environment Observation Rating Scale (STERS) is a classroom
observation tool consisting of a framework for classroom observation and a teacher interview.
The observation framework consists of five items corresponding to dimensions of quality science
instruction in preschool classrooms: 1) Create a Physical Environment for Inquiry and Learning;
2) Facilitate Direct Experiences to Promote Conceptual Learning; 3) Promote Use of Scientific
Inquiry; 4) Plan In-depth Investigations; 5) Assess Children’s Learning. Each item is
rated on a 4-point scale where “1” corresponds to Inadequate and “4” corresponds to
Exemplary. The internal consistency of the STERS is high (Cronbach’s alpha = .96). Six data
collectors were trained by EDC staff to collect fall data, and eight were trained in the spring.
Training included an introduction to the STERS and instructions for scoring each dimension
based on classroom observation and teacher interview. Each data collector was accompanied by
the trainer during the first classroom observation, and both observed and scored the same
classroom. Their scores were then calibrated to ensure reliable scoring and consistency across
data collectors.
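The internal-consistency index reported for the STERS and the TPTs is Cronbach's alpha. A minimal sketch of the standard formula follows, applied to a units-by-items score matrix. The function name `cronbach_alpha` and the ratings are hypothetical and are not STERS data.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a units-by-items score matrix.

    scores: list of rows, one per rated unit (e.g., a classroom), each row holding
    that unit's rating on every item of the scale.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    items = list(zip(*scores))               # one tuple of ratings per item
    totals = [sum(row) for row in scores]    # each unit's summed rating
    item_var_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))

# Hypothetical ratings: 5 classrooms x 5 items on a 1-4 scale (not study data).
ratings = [
    [1, 1, 2, 1, 1],
    [2, 2, 2, 2, 1],
    [3, 2, 3, 3, 2],
    [3, 3, 4, 3, 3],
    [4, 4, 4, 4, 3],
]
print(round(cronbach_alpha(ratings), 2))  # prints 0.98
```

Alpha approaches 1 when items rise and fall together across units, as they do in these invented ratings. Values like the .96 reported for the STERS indicate that the five dimensions tend to be rated consistently within a classroom.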

To assess teachers’ content and pedagogical knowledge, we created a set of four Science
Teacher Performance Tasks (TPTs). These tasks were designed to assess a teacher’s ability to:
plan science curriculum, including both content and inquiry components (Planning a Science
Experience); evaluate a child’s science understanding based on his representation (Interpreting a
Child’s Work Sample) and his behavior during an exploration of water (Analyzing
Misconceptions); and analyze teacher facilitation of a science exploration (Analysis of Science
Teaching). We assessed teachers’ performance based on their written analysis or explanation in
response to a common prompt (e.g., a video or a child’s work sample). Each response is rated
on a 4-point scale where “1” corresponds to Little or no evidence of knowledge and “4”
corresponds to Clear, consistent, and convincing evidence of knowledge. The reliability of the
TPTs is .82, as measured by Cronbach’s alpha.

As part of this effort, we also developed the Preschool Assessment of Science (PAS), a
performance-based instrument aimed at uncovering how 4-year-olds think about matter and the
forces that act on matter. For example, what do young children think about the way that water
naturally flows, or about how the direction and rate of flow may be changed by acting on the
water (e.g., by using a squeeze bottle to expel water forcefully)? Or, how far do they think a ball
will travel after it rolls down a ramp, and do they think that the distance depends on the weight of
the ball and the slope of the ramp? The PAS is organized into three main tasks: Water Flow (WF),
Marbles & Ramps (MR), and Floating & Sinking (FS). Based on a reliability analysis, we
separated the items in the Floating & Sinking task into two scales: one involving definitions and
explanations (FS verbal) and the other involving predictions in a sorting activity (FS sorting). The
other two PAS tasks were each represented by a separate scale.

For data analysis, we constructed a pair of regression models for each measure by first
regressing the post-test score (obtained in spring) on the baseline pre-test score (obtained in fall),
and then adding the predictor of Group (FSL vs. control). Models were built using standard
OLS for teacher/classroom measures and hierarchical linear modeling (HLM) for child measures.
During the model-building process, we also examined the impact of additional predictors as
control or moderator variables. For each model, we present coefficients, standard errors, and
indices of effect size, including R² and δ (δ is defined as the ratio between the regression
coefficient for Group and the standard deviation of the outcome; Liu, Spybrook, Congdon, Martinez,
& Raudenbush, 2006).
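The two-step model sequence for the teacher and classroom (OLS) measures can be sketched as follows. Model 1 regresses the spring score on the fall score, and Model 2 adds a Group indicator. This is a minimal sketch that assumes Group is coded 1 = FSL and 0 = control. It does not implement the HLM models used for the child measures. The helper names and data are hypothetical.

```python
import numpy as np

def fit_ols(y, *predictors):
    """Fit y on an intercept plus the given predictors by ordinary least squares.

    Returns the coefficient vector (intercept first) and the model R^2.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y))] + [np.asarray(p, dtype=float) for p in predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    r2 = 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)
    return coef, r2

def model_pair(post, pre, group):
    """Model 1: post ~ pre. Model 2: post ~ pre + group (1 = FSL, 0 = control)."""
    _, r2_model1 = fit_ols(post, pre)
    coef2, r2_model2 = fit_ols(post, pre, group)
    return {
        "group_coef": coef2[2],   # FSL - control difference, adjusted for pre-test
        "r2_model1": r2_model1,
        "r2_model2": r2_model2,
        "delta_r2": r2_model2 - r2_model1,
    }

# Hypothetical classroom ratings on a 1-4 scale (not study data).
fall   = [1.5, 2.0, 2.5, 3.0, 1.5, 2.0, 2.5, 3.0]
spring = [1.6, 1.9, 2.4, 2.8, 2.9, 3.3, 3.6, 3.9]
group  = [0, 0, 0, 0, 1, 1, 1, 1]
print(model_pair(spring, fall, group))
```

Comparing the two models isolates the contribution of Group beyond the baseline score. The `group_coef` entry plays the role of the Group coefficient reported in the findings.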

Findings/Results: 

Teacher Outcomes. Group (FSL vs. control) was a significant predictor of spring Teacher 
Performance Tasks (TPTs) [t(53) = 6.44, p < .001, R² = .34, δ = 1.75]. On average, FSL teachers
scored 0.7 points higher than control teachers on the TPTs (for which scores range from 1-4). 
(Please insert table 2 here). 

Classroom Outcomes. FSL classrooms showed stronger outcomes than control classrooms
on both the ECERS-R and STERS. On the ECERS-R, we found statistically significant outcome
differences between FSL and control classrooms on the Language-Reasoning (LR) subscale
[t(39) = 2.11, p < .05, R² = .09, δ = 0.68], with ratings (which range from 1-7) averaging 0.8
points higher in FSL than in control classrooms. On the STERS, group differences were even
more pronounced, reflecting the close alignment of the STERS with the focus of the FSL
intervention on supporting science teaching and learning. In particular, STERS ratings (which
range from 1-4) were 1.6 points higher in FSL than in control classrooms, a highly significant
difference [t(39) = 9.43, p < .001, R² = .63, δ = 3.00]. Interestingly, statistically significant
correlations were found between TPT scores and classroom ratings [r = .36, p < .05 for ECERS-LR
and r = .57, p < .05 for STERS], which suggests that the TPT could serve as a proxy for
classroom practice. (Please insert table 3 here).

Child Outcomes. Statistically significant positive effects for FSL were found for two blocks
of PAS items whose content was most heavily emphasized in the intervention. In particular,
on the WF block involving how force affects the direction and rate of water flow, containing nine
items (scored 0-9 overall), FSL children averaged 0.6 points higher than did control children
[t(43) = 2.36, p < .05, R² = .08, δ = .37]. Similarly, FSL children scored 0.4 points higher than
control children on the principal MR block [t(43) = 2.01, p = .05, R² = .04, δ = .28], which
includes six items (scored 0-6 overall) on how the speed of a ball rolling down a ramp, and the
distance it travels, can be altered by changes in the slope of the ramp. FSL children also tended to
have higher spring scores than control children on the FS scales, but these effects were not
statistically significant [δ = .20 for FS verbal and δ = .09 for FS sorting]. (Please insert table 4 here).
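The child-level effect sizes follow directly from the Table 4 footnote, which defines δ as γ01 divided by the square root of Var(r_jk) + Var(u_0k). The sketch below plugs in the Model 2 estimates from Table 4. The helper name `delta_effect_size` is hypothetical.

```python
import math

def delta_effect_size(gamma_01, var_r, var_u0):
    """Standardized FSL effect: the FSL coefficient divided by the square root of
    the total residual variance (child-level plus classroom-level)."""
    return gamma_01 / math.sqrt(var_r + var_u0)

# Model 2 estimates (gamma_01, Var(r_jk), Var(u_0k)) as reported in Table 4.
table4 = {
    "Water Flow (Block 2)": (0.598, 2.359, 0.241),
    "Marbles & Ramps (Block 1)": (0.362, 1.656, 0.057),
    "Floating & Sinking (Verbal)": (0.331, 2.383, 0.263),
    "Floating & Sinking (Sorting)": (0.192, 4.716, 0.006),
}
for scale, values in table4.items():
    print(f"{scale}: delta = {delta_effect_size(*values):.2f}")
```

Rounded to two decimals, this reproduces the values reported above: .37, .28, .20, and .09.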

Figure 1 summarizes the important pattern of results on the PAS scales, namely that
students in FSL classrooms tended to show greater improvement in PAS scores than
students in control classrooms. This pattern remained evident after controlling for other baseline
child variables (e.g., gender, ELL status, and baseline performance on standardized measures).
There was also evidence that improvements in classroom practices from fall to
spring were associated with improvements in children’s performance on the PAS. For example,
controlling for fall scores, spring STERS scores were positively correlated with spring classroom
mean WF [r = .33, p < .05] and FS verbal [r = .33, p < .05] scores. (Please insert figure 1 here).
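The text does not specify exactly which fall scores served as controls or how the adjustment was made. The sketch below assumes the fall scores enter as linear covariates. It computes a partial correlation by regressing both variables on the covariates and correlating the residuals. The function name and the variable names in the usage comment are hypothetical.

```python
import numpy as np

def partial_corr(x, y, covariates):
    """Correlation of x and y after removing the linear effect of the covariates
    (plus an intercept) from each variable."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = np.column_stack([np.ones(len(x))] + [np.asarray(c, dtype=float) for c in covariates])

    def residualize(v):
        # Least-squares fit of v on Z; keep what the covariates cannot explain.
        beta, *_ = np.linalg.lstsq(Z, v, rcond=None)
        return v - Z @ beta

    return float(np.corrcoef(residualize(x), residualize(y))[0, 1])

# Hypothetical usage with classroom-level arrays:
# partial_corr(spring_sters, spring_wf_mean, [fall_sters, fall_wf_mean])
```

Residualizing both variables accommodates any number of control variables. With a single covariate, it matches the familiar first-order partial-correlation formula.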

Conclusions: 

Our results indicate that FSL had a strong impact on teachers’ science knowledge and
classroom practices in inquiry-based science instruction, and that children in FSL classrooms
showed a trend towards greater improvement in their understanding of basic physical science
principles and their use of science inquiry skills. In addition, our findings have led to specific
revisions in FSL, including the use of more constrained classroom assignments, allowing for a
better alignment between FSL and PAS content, and a stronger emphasis on the growth of
children’s reflective capacity in the context of FSL.

We conclude that developing successful professional development programs requires
evaluation at every level in the “causal chain” that connects teaching and learning, from
teachers’ content knowledge, to their ability to apply it in their classrooms, to children’s ability
to engage in focused science activities with genuine conceptual content.
Appendix B. Tables and Figures

Table 1
Recruitment and Analytic Samples for FSL Pilot Study

               Recruitment Sample                     Analytic Sample¹
               # classrooms # teachers   # children   # classrooms # teachers   # children
FSL group      32           42           279          26           33           208
Control group  18           24           191          17           23           130
Total          50           66           470          43           56           338

¹Defined as the number of cases with data available on at least one post-test outcome measure.
Table 2
Regression Models Examining the Effect of Group on Teacher Performance Task Outcomes

Model 2
0.52
0.562
MSE    0.314    0.179

**p < .01, *p < .05; B = unstandardized regression coefficient; b = standardized coefficient; MSE = …
$\ldots = \Delta R^2 / (1 - \Delta R^2)$, where $\Delta R^2$ is the difference in $R^2$ between Model 1 and Model 2.
Table 3
Regression Models Examining the Effect of Group on Classroom Outcomes

Model 1    Model 2

**p < .01, *p < .05; B = unstandardized regression coefficient; b = standardized coefficient; MSE = …
$\ldots = \Delta R^2 / (1 - \Delta R^2)$, where $\Delta R^2$ is the difference in $R^2$ between Model 1 and Model 2.
Table 4
HLM Regression Models Examining the Effect of Group on PAS Child Outcomes

Model 1: $\mathrm{SpringPAS}_{jk} = \beta_{0k} + \beta_{1}\,\mathrm{FallPAS} + r_{jk}$, with $\beta_{0k} = \gamma_{00} + u_{0k}$
Model 2: $\mathrm{SpringPAS}_{jk} = \beta_{0k} + \beta_{1}\,\mathrm{FallPAS} + r_{jk}$, with $\beta_{0k} = \gamma_{00} + \gamma_{01}\,\mathrm{FSL} + u_{0k}$

                                            Model 1                        Model 2
PAS Scale                     Effect        B        SE     t      df      B        SE     t      df    δ
Water Flow                    γ00           4.494**  0.383  11.73  44      4.251**  0.429  9.92   43
                              β1            0.576**  0.051  11.32  260     0.571**  0.051  11.19  259
                              γ01                                          0.451    0.357  1.27   43    0.19
                              Var(r_jk)     5.421                          5.408
                              Var(u_0k)     0.357                          0.358
Water Flow (Block 2)          γ00           3.204**  0.233  13.78  44      2.846**  0.272  10.46  43
                              β1            0.461**  0.053  8.76   260     0.457**  0.052  8.77   259
                              γ01                                          0.598*   0.253  2.36   43    0.37
                              Var(r_jk)     2.371                          2.359
                              Var(u_0k)     0.293**                        0.241*
Marbles & Ramps               γ00           4.189**  0.231  18.11  44      3.887    0.275  14.13  43
                              β1            0.328**  0.050  6.56   268     0.330    0.050  6.64   267
                              γ01                                          0.480    0.241  1.99   43    0.29
                              Var(r_jk)     2.508                          2.501
                              Var(u_0k)     0.205*                         0.173*
Marbles & Ramps (Block 1)     γ00           3.230**  0.163  19.86  44      3.000**  0.197  15.19  43
                              β1            0.326**  0.047  6.88   268     0.329**  0.047  6.98   267
                              γ01                                          0.362*   0.180  2.01   43    0.28
                              Var(r_jk)     1.666                          1.656
                              Var(u_0k)     0.069                          0.057
Floating & Sinking (Verbal)   γ00           1.708**  0.174  9.82   44      1.506**  0.233  6.45   43
                              β1            0.618**  0.063  9.74   269     0.615**  0.063  9.69   268
                              γ01                                          0.331    0.257  1.29   43    0.20
                              Var(r_jk)     2.386                          2.383
                              Var(u_0k)     0.268**                        0.263**
Floating & Sinking (Sorting)  γ00           5.628**  0.369  15.27  44      5.500    0.412  13.34  43
                              β1            0.166**  0.055  3.01   269     0.167    0.055  3.03   268
                              γ01                                          0.192    0.275  0.70   43    0.09
                              Var(r_jk)     4.707                          4.716
                              Var(u_0k)     0.007                          0.006

***p < .001, **p < .01, *p < .05; B = unstandardized regression coefficient; δ is defined as the ratio
between the regression coefficient for FSL and the standard deviation of the outcome (Liu, Spybrook,
Congdon, Martinez, & Raudenbush, 2006): $\delta = \gamma_{01} / \sqrt{\mathrm{Var}(r_{jk}) + \mathrm{Var}(u_{0k})}$.
Figure 1. Average Improvement in PAS Scores from Fall to Spring